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Abstract 

We describe our language-independent un- 
supervised word sense induction system. 
This system only uses topic features to 
cluster different word senses in their global 
context topic space. Using unlabeled data, 
this system trains a latent Dirichlet allo- 
cation (LDA) topic model then uses it to 
infer the topics distribution of the test in- 
stances. By clustering these topics dis- 
tributions in their topic space we cluster 
them into different senses. Our hypothesis 
is that closeness in topic space reflects sim- 
ilarity between different word senses. This 
system participated in SemEval-2 word 
sense induction and disambiguation task 
and achieved the second highest V-measure 
score among all other systems. 

1 Introduction 

Ambiguity of meaning is inherent in natural lan- 
guage because the deliverer of words tries to mini- 
mize the size of the vocabulary set he uses. There- 
fore, a sizable portion of this vocabulary is polyse- 
mous and the intended meaning of such words can 
be encoded in their context. 

Due to the knowledge acquisition bottleneck 



problem and scarcity in training data (Cai et 



al., 2007), unsupervised corpus based approaches 



could be favored over supervised ones in word sense 
disambiguation (WSD) tasks. 

Similar efforts in this area include work by Cai 
et al. (Cai et al., 20071 in which they use latent 



Dirichlet allocation (LDA) topic models to extract 
the global context topic and use it as a feature 
along other baseline features. Another technique 
uses clustering based approach with WordNet as an 
external resource for disambiguation without rely- 



ing on training data ( Anaya- Sanchez et al., 2007) 



To disambiguate a polysemous word in a text 
document, we use the document topic distribution 
to represent its context. A document topic distri- 
bution is the probabilistic distribution of a docu- 
ment over a set of topics. The assumption is that: 
given two word senses and the topic distribution 




Figure 1: A graphical model for LDA 

of their context, the closeness between these two 
topic distributions in their topic space is an indi- 
cation of the similarity between those two senses. 

Our motivation behind building this system was 
the observation that the context of a polysemous 
word helps determining its sense to some degree. 
In our word sense induction (WSI) system, we use 
LDA to create a topic model for the given corpus 
and use it to infer the topic distribution of the 
documents containing the ambiguous words. 

This paper describes our WSI system which par- 
ticipated in SemEval-2 word sense induction and 



disambiguation task (Manandhar et al., 2010). 



2 Latent Dirichlet allocation 

LDA is a probabilistic model for a collection of dis- 
crete data (Blei et al., 2003). It can be graphically 



represented as shown in Figure [T] as a three level 
hierarchical Bayesian model. In this model, the 
corpus consists of M documents, each is a multino- 
mial distribution over K topics, which are in turn 
multinomial distributions over words. 

To generate a document d using this probabilis- 
tic model, a distribution over topics 9d is generated 
using a Dirichlet prior with parameter a. Then, 
for each of the Nd words Wdn in the document, 
a topic Zdn is drawn from a multinomial distribu- 
tion with the parameter 6d- Then, a word Wdn is 
drawn from that topic's distribution over words, 
given Pij = p(w = i\z = j) . Where is the proba- 
bility of choosing word i given topic j. 

3 System description 

We wanted to examine the trade-off between sim- 
plicity, cost and performance by building a simple 



language-independent, totally unsupervised, com- 
putationally cheap system and compare its perfor- 
mance to other WSI systems participating in the 
SemEval-2 WSI task (iManandhar et al., 20101). We 



expect a degradation in precision of our simple ap- 
proach as the granularity of senses becomes finer; 
This is due to the degrading sensitivity in mapping 
between the topics space and the senses space. We 
note that our simple approach will fail if multiple 
senses of the same word appear in the same docu- 
ment; Since these senses will be represented by the 
same topic distribution of the document, they will 
be clustered in the same cluster. 

Our system is a language-independent system. 
The used LDA topic model has no knowledge of 
the training or testing corpus language. Unlike 
most other WSI and WSD systems, it doesn't make 
use of part of speech (POS) features which are lan- 
guage dependent and require POS annotated train- 
ing data. The only features used are the topics dis- 
tribution of bag-of-words containing the ambigu- 
ous word. 

First, for each target polysemous word wp (noun 
or verb), we train a MALLET^ parallel topic model 
implementation of LDA on all the training in- 
stances of that word. Then we use the trained topic 
model to infer the topics distribution 6i for each of 
the test instances of that word. For a A"-topics 
topic model, each topics distribution can be repre- 
sented as a point in a A-dimensional topic space. 
These points can be clustered into C different clus- 
ters, each representing a word sense. We used 
MALLET's A-means clustering algorithm with co- 
sine similarity to measure the distance between dif- 
ferent topic distributions in the topic space. 

4 Evaluation measures 

We use the same unsupervised evaluation mea- 



sures used in SemEval-2 (Manandhar and Kla- 



paftis, 2009). These measures do not require de- 



scriptive 

The V-measure is used for unsupervised evalu- 
ation. It is the harmonic mean of the homogene- 
ity and completeness. Homogeneity is a measure 
of the degree that each formed cluster consists of 
data points that belong to a single gold standard 
(GS) class as defined below. 
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Table 1: Effect of varying the number of topics A 
on performance 



K 

V-measurc 
F-score 



10 
8.6 



50 
5.8 
32.0 



200 
7.2 
53.9 



400 500 
~SL4 87T 
63.9 64.2 



Where HQ is an entropy function, |C| and \GS\ 
refer to cluster and class sizes, respectively. N is 
the number of data points, ay are data points of 
class GSi that belong to cluster Cj. 

On the other hand, completeness measures the 
degree that each class consists of data points that 
belong to a single cluster. It is defined as follows. 
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1 http:/ /mallet. cs.umass.edu 



Homogeneity and completeness can be seen as 
entropy based measures of precision and recall, re- 
spectively. The V-measure has a range of (worst 
performance) to 1, inclusive. 

The other evaluation measure is the F-score, 
which is the harmonic mean of precision and re- 
call. It has a range of to 1 (best performance), 
inclusive. 

5 Experiments and results 

The WSI system described earlier was tested on 
SemEval-1 WSI task (task 2) data (65 verbs, 
35 nouns), and participated in the same task in 
SemEval-2 (task 14) (50 verbs, 50 nouns). The 
sense induction process was the same in both cases. 

Before running our main experiments, we 
wanted to see how the number of topics A used in 
the topic model could affect the performance of our 
system. We tested our WSI system on SemEval-1 
data using different A values as shown in Table [T] 
We found that the V-measure and F-score values 
increase with increasing A, as more dimensions are 
added to the topic space, the different senses in this 
A-dimensional space unfold. This trend stops at a 
value of A = 400 in a sign to the limited vocabu- 
lary of the training data. This A value is used in 
all other experiments. 

Next, we evaluated the performance of our sys- 
tem on SemEval-1 WSI task data. Since no train- 
ing data was provided for this task, we used an un- 
annotated version of the test instances to create the 
LDA topic model. For each target word (verb or 
noun) , we trained the topic model on its given test 



Table 2: V-measure and F-score on SemEval-1 
All Verbs Nouns 
V-mcasurc 8^4 870 8.7 
F-score 63.9 56.8 69.0 

Table 3: V-measure and F-score on SemEval-2 
All Verbs Nouns 
V-measure 1577 1271 lKO 
F-score 36.9 54.7 24.6 



instances. Then we used the generated model's in- 
ferencer to find the topics distribution of each one 
of them. These distributions are then clustered in 
the topic space using the if -means algorithm and 
the cosine similarity measure was used to evalu- 
ate the distances between these distributions. The 
results of this experiment are shown in Table [2] 

Our WSI system took part in the main SemEval- 
2 WSI task (task 14). In the unsupervised evalua- 
tion, our system had the second highest V-measure 
value of 15.7 for all wordf0 A break down of the 
obtained V-measure and F-scores is shown in Table 

El 

To analyze the performance of the system, we 
examined the clustering of the target noun word 
"promotion" to different senses by our system. We 
compared it to the GS classes of this word in the 
answer key provided by the task organizers. For 
a more objective comparison, we ran the if -means 
clustering algorithm with K equal to the number 
of GS classes. Even though the number of formed 
clusters affects the performance of the system, we 
assume that the number of senses is known in this 
analysis. We focus on the ability of the algorithm 
to cluster similar senses together. A graphical com- 
parison is given in Figure [2j 

The target noun word "promotion" has 27 in- 
stances and four senses. The lower four rectangles 
in Figure [2] represent the four different GS classes, 
and the upper four rectangles represent the four 
clusters created by our system. Three of the four 
instances representing a job "promotion" (O) were 
clustered together, but the fourth one was clus- 
tered in a different class due to terms like "driv- 
ing," "troops," and "hostile" in its context. The 
offer sense of "promotion" {\/) was mainly split 
between two clusters, cluster 2 which most of its 
instances has mentions of numbers and monetary 
units, and cluster 4 which describes business and 
labor from an employee's eye. 

The 13 instances of the third class which carry 
the sense encourage of the word promotion (□) are 
distributed among the four different clusters de- 
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Figure 2: Analysis of sense clustering 

pending on other topic words that classified them 
as either belonging to cluster 4 (encouragement in 
business), cluster 3 (encouragement in conflict or 
war context), cluster 2 (numbers and money con- 
text), or cluster 1 (otherwise). We can see that the 
topic model is unable to detect and extract topic 
words for the "encourage" sense of the word. Fi- 
nally, due to the lack of enough training instances 
of the sense of a promotional issue of a newspaper 
(s$5), the topic model inferencer clustered it in the 
numbers and monetary cluster because it was rich 
in numbers. 

6 Conclusion 

Clustering the topics distributions of the global 
context of polysemous words in the topic space to 
induce their sense is cheap as it does not require 
any annotated data and is language-independent. 

Even though the clustering produced by our sys- 
tem did not fully conform with the set of senses 
given by the GS classes, it can be seen from the 
analyzed example given earlier that our cluster- 
ing carried some different senses. In one case, a 
GS sense was not captured by the topic model, 
and instead, other cues from its instances context 
were used to cluster them accordingly. The in- 
duced clustering had some noise though. 

This simple WSI approach can be used for cheap 
sense induction or for languages for which no 
POS tagger has been created yet. This system 
which had the second highest V-measure score in 
SemEval-2 WSI task achieves a good trade-off be- 
tween performance and cost. 



2 A complete evaluation of all partic- 
ipating systems is available online at 

http: / /www. cs.york.ac.uk/scmcval2010_WSI/task_14_ranking.ht 



References 

[Anaya-Sanchez et al.2007] Henry Anaya-Sanchez, 
Aurora Pons-Porrata, and Rafael Berlanga- 
Llavori. 2007. Tkb-uo: Using sense cluster- 
ing for wsd. In Proceedings of the Fourth In- 
ternational Workshop on Semantic Evaluations 
u.i (SemEval-2007), pages 322-325, Prague, Czech 



Republic, June. Association for Computational 
Linguistics. 

[Blei et al.2003] David M. Blei, Andrew Y. Ng, and 
Michael I. Jordan. 2003. Latent dirichlet alloca- 
tion. J. Mach. Learn. Res., 3:993-1022. 

[Cai et al.2007] Junfu Cai, Wee Sun Lee, and 
Yee Whye Teh. 2007. Improving word sense 
disambiguation using topic features. In Pro- 
ceedings of the 2001 Joint Conference on Em- 
pirical Methods in Natural Language Processing 
and Computational Natural Language Learning 
( EMNLP- CoNLL ) , pages 1015-1023, Prague, 
Czech Republic, June. Association for Compu- 
tational Linguistics. 

[Manandhar and Klapaftis2009] Suresh Man- 
andhar and Ioannis P. Klapaftis. 2009. 
Semeval-2010 task 14: evaluation setting for 
word sense induction & disambiguation systems. 
In DEW '09: Proceedings of the Workshop on 
Semantic Evaluations: Recent Achievements 
and Future Directions, pages 117-122, Morris- 
town, NJ, USA. Association for Computational 
Linguistics. 

[Manandhar et al.2010] Suresh Manandhar, Ioan- 
nis P. Klapaftis, Dmitriy Dligach, and Sameer S. 
Pradhan. 2010. Semeval-2010 task 14: Word 
sense induction & disambiguation. In Proceed- 
ings of SemEval-2, Uppsala, Sweden. ACL. 



